Simulations TWFE

Vincent Bagilet https://vincentbagilet.github.io/ (Columbia University)https://www.columbia.edu/
November 4, 2021

Purpose of the document and summary

In this document, I run a simulation exercise to illustrate issues arising when estimating Two-Way Fixed Effect (TWFE) models with staggered and heterogeneous treatment.

Summary: if one only wants to use the data generated here and play with it without reading my whole spiel, they can use the function generate_data_TWFE. It takes many parameters but they are all pretty intuitive and described in the section “Modelisation choices” of this document. The other useful function is compute_simulation_TWFE. Taking similar parameters as generate_data_TWFE it generates the data and runs an estimation.

Packages

First, I load useful packages. Note that some packages are optional. If you do not want to install/use them, you may need to modify part of your code. mediocrethemes is my ggplot theme package. If you want to use it, you can find instructions here.

Show code
#necessary packages
library(tidyverse) 
library(fixest)
library(knitr) 
library(broom)
library(scales)

#Optional packages
library(mediocrethemes) 
library(tictoc) 
library(here)
library(beepr)

set.seed(1)

mediocrethemes::set_mediocre_all(pal = "coty", gradient = "left")

Building the example

To illustrate issues arising with TWFE, I build simple fake data, estimate a TWFE model on it and replicate the analysis several times.

Modelisation choices

To simplify, I consider the assumptions described below. Of course these assumptions are purely arbitrary and I invite you to play with them. Note that, fixed effects and the covariate are not necessary to the analysis. I only add them to make the analysis more realistic if necessary but I set their baseline values to 0.

More precisely, I set:

I also create a bunch of variables that can be useful:

Data generation

I write a simple function that generates the data. It takes as input the values of the different parameters and returns a data frame containing all the variables for this analysis.

Show code
generate_data_TWFE <- function(N_i,
                               N_t,
                               sigma_e,
                               p_treat,
                               staggered,
                               het_indiv,
                               het_time,
                               alpha,
                               beta,
                               mu_indiv_fe = 0, 
                               sigma_indiv_fe = 0,
                               mu_time_fe = 0, 
                               sigma_time_fe = 0,
                               mu_x = 0, 
                               sigma_x = 0,
                               gamma = 0
                             ) {

  if (!is.logical(staggered)) {stop("staggered must be logical")} 
  if (!(het_indiv %in% c("large_first", "random", "homogeneous"))) {
    stop('het_indiv must be either "large_first", "random" or "homogeneous"')
  } 
  if (!(het_time %in% c("constant", "linear"))) {
    stop('het_time must be either "constant" or "linear"')
  } 
  
  data <- tibble(indiv = 1:N_i) %>%
    mutate(in_treatment = (indiv %in% sample(1:N_i, floor(N_i*p_treat)))) %>% 
    crossing(t = 1:N_t) %>%
    group_by(indiv) %>%
    mutate(
      indiv_fe = rnorm(1, mu_indiv_fe, sigma_indiv_fe),
      t_event = ifelse(staggered, sample(2:(N_t - 1), 1), floor(N_t/2)), 
        #I use 2:(N_t-1) to have a pre and post period
      t_event = ifelse(in_treatment, t_event, NA),
      beta_i = case_when(
        het_indiv == "large_first" ~ N_t-t_event,
        het_indiv == "random" ~ runif(1, beta*0.5, beta*1.5), 
        het_indiv == "homogeneous" ~ beta
      ),
      beta_i = ifelse(is.na(t_event), 0, beta_i)
    ) %>%
    ungroup() %>%
    group_by(t) %>%
    mutate(time_fe = rnorm(1, mu_time_fe, sigma_time_fe)) %>%
    ungroup() %>%
    mutate(
      post = (t > t_event),
      treated = in_treatment & post, 
      beta_i = ifelse(
        het_time == "linear" & post & !is.na(t_event),
        beta_i*(t - t_event), 
        beta_i
      ),
      t_centered = t - t_event,
      x = rnorm(nrow(.), mu_x, sigma_x),
      e = rnorm(nrow(.), 0, sigma_e),
      y0 = alpha + gamma * x + indiv_fe + time_fe + e,
      y1 = y0 + beta_i,
      y = treated*y1 + (1 - treated)*y0
    )
  
  return(data)
}

I set baseline values for the parameters as very standard. These values are arbitrary.

Show code
baseline_parameters_TWFE <- tibble(
  N_i = 20,
  N_t = 50,
  sigma_e = 1,
  p_treat = 0.8,
  staggered = TRUE,
  het_indiv = "homogeneous",
  het_time = "constant",
  alpha = 1,
  beta = 1
)

Here is an example of data created with the data generating process and baseline parameter values, for 2 individuals and 8 time periods:

Show code
baseline_parameters_TWFE %>% 
  mutate(N_i = 2, N_t = 8) %>%
  pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
  select(indiv, t, y, in_treatment, post, treated, t_centered, e) %>% 
  kable()
indiv t y in_treatment post treated t_centered e
1 1 2.5952808 TRUE FALSE FALSE -4 1.5952808
1 2 1.3295078 TRUE FALSE FALSE -3 0.3295078
1 3 0.1795316 TRUE FALSE FALSE -2 -0.8204684
1 4 1.4874291 TRUE FALSE FALSE -1 0.4874291
1 5 1.7383247 TRUE FALSE FALSE 0 0.7383247
1 6 2.5757814 TRUE TRUE TRUE 1 0.5757814
1 7 1.6946116 TRUE TRUE TRUE 2 -0.3053884
1 8 3.5117812 TRUE TRUE TRUE 3 1.5117812
2 1 1.3898432 FALSE NA FALSE NA 0.3898432
2 2 0.3787594 FALSE NA FALSE NA -0.6212406
2 3 -1.2146999 FALSE NA FALSE NA -2.2146999
2 4 2.1249309 FALSE NA FALSE NA 1.1249309
2 5 0.9550664 FALSE NA FALSE NA -0.0449336
2 6 0.9838097 FALSE NA FALSE NA -0.0161903
2 7 1.9438362 FALSE NA FALSE NA 0.9438362
2 8 1.8212212 FALSE NA FALSE NA 0.8212212

Let’s now have a look at different types of treatment and treatment allocations. First, let’s look at treatment allocation mechanisms. The allocation can either be staggered or not, the treatment homogeneous across individual or not and cconstant or not in time.

Show code
labs_graph_staggered <- labs(
    title = "Treatement assignment across time and individuals",
    x = "Time index", 
    y = "Individual id", 
    fill = "Treated"
  )

baseline_parameters_TWFE %>% 
  mutate(staggered = FALSE) %>%
  pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
  ggplot(aes(x = t, y = factor(indiv), fill = factor(treated))) + 
  geom_tile(color = "white", lwd = 0.3, linetype = 1) +
  coord_fixed() +
  labs_graph_staggered + 
  labs(subtitle = "Non staggered")
Show code
baseline_parameters_TWFE %>% 
  mutate(staggered = TRUE) %>%
  pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
  ggplot(aes(x = t, y = factor(indiv), fill = factor(treated))) + 
  geom_tile(color = "white", lwd = 0.3, linetype = 1) +
  coord_fixed() +
  labs_graph_staggered + 
  labs(subtitle = "Staggered")

Now, let’s vary treatment effect size across individuals, considering a staggered adoption.

Show code
labs_graph_size <- labs(
    title = "Treatement effect size across time and individuals",
    x = "Time index", 
    y = "Individual id", 
    fill = "Treatment effect size"
  )

baseline_parameters_TWFE %>% 
  mutate(het_indiv = "homogeneous", het_time = "constant") %>%
  pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
  ggplot(aes(x = t, y = factor(indiv), fill = round(treated*beta_i, 2))) + 
  geom_tile(color = "white", lwd = 0.3, linetype = 1) +
  coord_fixed() +
  labs_graph_size + 
  labs(subtitle = "Homogeneous treatment effect across individuals, constant in time")
Show code
baseline_parameters_TWFE %>% 
  mutate(het_indiv = "random", het_time = "constant") %>%
  pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
  ggplot(aes(x = t, y = factor(indiv), fill = round(treated*beta_i, 2))) + 
  geom_tile(color = "white", lwd = 0.3, linetype = 1) +
  coord_fixed() +
  labs_graph_size + 
  labs(subtitle = "Random treatment effect size across individuals, constant in time")
Show code
baseline_parameters_TWFE %>% 
  mutate(het_indiv = "large_first", het_time = "constant") %>%
  pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
  ggplot(aes(x = t, y = factor(indiv), fill = round(treated*beta_i, 2))) + 
  geom_tile(color = "white", lwd = 0.3, linetype = 1) +
  coord_fixed() +
  labs_graph_size + 
  labs(subtitle = "First treated have larger treatment effect, constant in time")

A last thing we can vary is that we can make individual effects increase linearly in time.

Show code
baseline_parameters_TWFE %>% 
  mutate(het_indiv = "homogeneous", het_time = "linear") %>%
  pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
  ggplot(aes(x = t, y = factor(indiv), fill = round(treated*beta_i, 2))) + #treated*beta_i
  geom_tile(color = "white", lwd = 0.3, linetype = 1) +
  coord_fixed() +
  labs_graph_size +
  labs(subtitle = "Treatment effect increasing linearly in time")

One can now play with this function to generate their own data and run their own analyses. In the following sections, I try to run my own analyses.

Estimation

First, I write a function ,estimate_TWFE. to run an estimation of a simple distributed lag model, in a simple event study type of analysis. My knowledge on the topic is limited and I built this analysis quickly. My analysis is likely to be flawed. One should thus use it with caution. Regardless, this code could hopefully be used as a template for a more informed analysis.

Show code
estimate_TWFE <- function(data) {
  reg <- data %>% 
    mutate(
      indiv = as.factor(indiv),
      t = as.factor(t),
      treated = as.numeric(treated),
      in_treatment = as.numeric(in_treatment),
      t_centered = as.factor(t_centered)
    ) %>% 
    feols(
    data = ., 
    fml = y ~ in_treatment:t_centered | indiv + t
    ) %>% 
    broom::tidy() %>% 
    rename(p_value = p.value, se = std.error) %>% 
    mutate(term = as.numeric(str_remove_all(term, "in_treatment\\:t_centered"))) %>% 
    rename(lag = term) %>% 
    select(-statistic) %>% 
    suppressMessages() #Warning saying that NA values dropped and 
    #that one or two factors are removed due to colinearity
  
  return(reg)
}

Here is an example output for such a simulation (limited to the 15 first lags).

Show code
baseline_parameters_TWFE %>% 
  pmap_dfr(generate_data_TWFE) %>%
  estimate_TWFE() %>% 
  slice(1:15) %>% 
  kable()
lag estimate se p_value
-47 48.35164 72.51067 0.5150134
-46 48.52208 71.78158 0.5093493
-45 47.23598 70.96010 0.5157290
-44 47.40445 70.15433 0.5095047
-43 47.37527 69.55334 0.5061656
-42 44.98009 68.74585 0.5228276
-41 45.95534 67.93711 0.5090587
-40 44.75955 67.06675 0.5146613
-39 44.44709 66.48944 0.5139813
-38 43.86001 65.56009 0.5136580
-37 43.11749 65.02702 0.5173463
-36 42.05929 63.90543 0.5204164
-35 41.68371 63.25326 0.5198866
-34 41.07402 62.68442 0.5222292
-33 41.62932 61.67068 0.5099311

Run a whole simulation

To run a whole simulation, I create the function compute_simulation_TWFE. This simple function takes as input the various parameters aforementioned an returns a table with the estimate of the treatment, its p-value and standard error and all input parameters.

Before writing this function, I compute the true effect (the ATT) to add it to the output of the simulation.

Show code
compute_true_effect_TWFE <- function(data) {
  data %>% 
    filter(in_treatment) %>% 
    group_by(t_centered) %>% 
    summarise(true_effect = mean(treated*(y1 - y0))) %>% 
    rename(lag = t_centered)
}  

I can then compute the simulation.

Show code
compute_simulation_TWFE <- function(N_i,
                                    N_t,
                                    sigma_e,
                                    p_treat,
                                    staggered,
                                    het_indiv,
                                    het_time,
                                    alpha,
                                    beta,
                                    mu_indiv_fe = 0,
                                    sigma_indiv_fe = 0,
                                    mu_time_fe = 0,
                                    sigma_time_fe = 0,
                                    mu_x = 0,
                                    sigma_x = 0,
                                    gamma = 0) {
  data <- generate_data_TWFE(
    N_i = N_i,
    N_t = N_t,
    sigma_e = sigma_e,
    p_treat = p_treat,
    staggered = staggered,
    het_indiv = het_indiv,
    het_time = het_time,
    alpha = alpha,
    beta = beta,
    mu_indiv_fe = mu_indiv_fe,
    sigma_indiv_fe = sigma_indiv_fe,
    mu_time_fe = mu_time_fe,
    sigma_time_fe = sigma_time_fe,
    mu_x = mu_x,
    sigma_x = sigma_x,
    gamma = gamma
  ) 
  
  data %>%
    estimate_TWFE() %>%
    mutate(
      N_i = N_i,
      N_t = N_t,
      sigma_e = sigma_e,
      p_treat = p_treat,
      staggered = staggered,
      het_indiv = het_indiv,
      het_time = het_time,
      alpha = alpha,
      beta = beta,
      mu_indiv_fe = mu_indiv_fe,
      sigma_indiv_fe = sigma_indiv_fe,
      mu_time_fe = mu_time_fe,
      sigma_time_fe = sigma_time_fe,
      mu_x = mu_x,
      sigma_x = sigma_x,
      gamma = gamma
    ) %>% 
    left_join(compute_true_effect_TWFE(data), by = "lag")
} 

Here is the output of one simulation:

Show code
baseline_parameters_TWFE %>% 
  pmap_dfr(compute_simulation_TWFE) 
# A tibble: 92 x 21
     lag estimate    se p_value   N_i   N_t sigma_e p_treat staggered
   <dbl>    <dbl> <dbl>   <dbl> <dbl> <dbl>   <dbl>   <dbl> <lgl>    
 1   -45   -0.229  2.09   0.914    20    50       1     0.8 TRUE     
 2   -44   -1.03   2.01   0.617    20    50       1     0.8 TRUE     
 3   -43   -0.476  1.80   0.794    20    50       1     0.8 TRUE     
 4   -42   -2.26   1.81   0.231    20    50       1     0.8 TRUE     
 5   -41   -0.593  2.02   0.774    20    50       1     0.8 TRUE     
 6   -40   -0.376  2.03   0.856    20    50       1     0.8 TRUE     
 7   -39   -0.965  1.72   0.583    20    50       1     0.8 TRUE     
 8   -38   -2.04   1.94   0.309    20    50       1     0.8 TRUE     
 9   -37   -1.26   1.90   0.519    20    50       1     0.8 TRUE     
10   -36   -0.265  1.99   0.896    20    50       1     0.8 TRUE     
# … with 82 more rows, and 12 more variables: het_indiv <chr>,
#   het_time <chr>, alpha <dbl>, beta <dbl>, mu_indiv_fe <dbl>,
#   sigma_indiv_fe <dbl>, mu_time_fe <dbl>, sigma_time_fe <dbl>,
#   mu_x <dbl>, sigma_x <dbl>, gamma <dbl>, true_effect <dbl>

Analysis of the results

To analyze the results, I build a simple function to run the regression and graph the results. It takes as inputs the baseline parameters and our parameters of interest and return a graph of the lag-estimates.

Show code
graph_results <- function(baseline_parameters, 
                          staggered = TRUE, 
                          het_indiv = "homogeneous", 
                          het_time = "constant") {
  
  baseline_parameters["staggered"] <- staggered
  baseline_parameters["het_indiv"] <- het_indiv
  baseline_parameters["het_time"] <- het_time
  
  graph <- baseline_parameters %>%
    pmap_dfr(compute_simulation_TWFE) %>% 
    filter(dplyr::between(lag, -5, 5)) %>% 
    mutate(
      estimate_level = (estimate - estimate[which(lag == 0)]),
      true_effect_level = (true_effect - true_effect[which(lag == 0)]), 
      lag = as.integer(lag)
    ) %>% 
    ggplot(aes(x = lag, y = estimate_level)) +
    geom_point() +
    geom_point(aes(x = lag, y = true_effect_level), shape = 1) +
    scale_x_continuous(breaks = scales::pretty_breaks()) +
    geom_pointrange(aes(
      ymin = estimate_level-1.96*se,
      ymax = estimate_level+1.96*se)) +
    labs(
      x = "Lag",
      y = "Estimate (centered)",
      title = "Representation of estimates for each lag",
      subtitle = paste(
      ifelse(staggered, "Staggered,", "Non staggered,"),
      het_indiv, ",",
      het_time,
      "treatment"),
      caption = "Hollow points represent the centered true effect
      Vertical bars represent 95% confidence intervals"
    )
  
  return(graph)
}
Show code
baseline_parameters_TWFE %>% 
  graph_results(staggered = TRUE, het_indiv = "large_first", het_time = "linear") 
Show code
baseline_parameters_TWFE %>% 
  graph_results(staggered = TRUE, het_indiv = "random", het_time = "linear") 
Show code
baseline_parameters_TWFE %>% 
  graph_results(staggered = TRUE, het_indiv = "homogeneous", het_time = "linear") 
Show code
baseline_parameters_TWFE %>% 
  graph_results(staggered = TRUE, het_indiv = "large_first", het_time = "constant") 
Show code
baseline_parameters_TWFE %>% 
  graph_results(staggered = TRUE, het_indiv = "random", het_time = "constant") 
Show code
baseline_parameters_TWFE %>% 
  graph_results(staggered = TRUE, het_indiv = "homogeneous", het_time = "constant") 

Out of these simulations, we notice that in most cases, the two ways fixed effect model yields incorrect estimates. Further and more precise analysis should and could be ran here. Note that for some types of treatment, the variation is too limited to be able to precisely estimate the effect of interest.

Comparing performance

I then try to compare more systematically the performance of the TWFE model in the different settings. To do so I run a Monte-Carlo simulation, creating many independent data sets and running the event study on each of them. To do so, I then compute the pre-treatment bias, ie the presence of pre-trends, the post-treatment bias and RMSE for each simulation.

I run the simulations for different sets of parameters by mapping the compute_simulation_TWFE function on each set of parameters. I enclose these parameters into a table, param_TWFE. Note that in this table each set of parameters appears n_iter times as we want to run the analysis \(n_{iter}\) times for each set of parameters.

Show code
n_iter <- 100

param_TWFE <- baseline_parameters_TWFE %>% 
  select(-het_indiv, -het_time) %>% 
  crossing(
    # staggered = c(TRUE, FALSE),
    het_indiv = c("large_first", "random", "homogeneous"),
    het_time = c("constant", "linear")
  ) %>% 
  crossing(rep_id = 1:n_iter) %>% 
  select(-rep_id)

I can now run the simulations by mapping the compute_simulation_TWFE function on param_TWFE.

Show code
tic()
simulations_TWFE <- pmap_dfr(param_TWFE, compute_simulation_TWFE)
beep()
toc()

# saveRDS(simulations_TWFE, here("TWFE/Outputs/simulations_TWFE.RDS"))

I want to compute the summary variables of interest (average pre-treatment bias, post-treatment bias , RMSE). To make bias comparable across simulations, I normalize it by dividing it by the mean of the true effects. This normalization method is somehow arbitrary. To run these analyses, I only consider -5 to +5 lags around the treatment time.

Show code
simulations_TWFE <- readRDS(here("TWFE/Outputs/simulations_TWFE.RDS"))

summarise_simulations <- function(data) {
  data %>%
    filter(dplyr::between(lag, -5, 5)) %>% 
    mutate(mean_true_effect = mean(true_effect, na.rm = TRUE)) %>% 
    group_by(across(c(-estimate, -se, -p_value, -true_effect))) %>% 
    #group by basically all the DGP parameters
    summarise(
      bias = mean((estimate - true_effect)/mean_true_effect, na.rm = TRUE),
      rmse = sqrt(mean(((estimate - true_effect)/mean_true_effect)^2)), 
      .groups  = "drop"
    ) %>% 
    mutate(post = ifelse(lag >= 0, "post", "pre")) %>% 
    group_by(across(c(-lag, -bias, -rmse))) %>%
    summarise(
      bias = mean(bias), 
      rmse = mean(rmse),
      .groups  = "drop"
    ) %>% 
    pivot_wider(
      names_from = post, 
      values_from = c(bias, rmse)
    ) %>% 
    select(-rmse_pre)
}

summary_simulations_TWFE <- summarise_simulations(simulations_TWFE)
# saveRDS(summary_simulations_TWFE, here("TWFE/Outputs/summary_simulations_TWFE.RDS"))

Then, we quickly graph these results to analyze them. Note that I only varied 2 parameters across simulations, het_indiv and het_time.

Show code
summary_simulations_TWFE %>% 
  select(het_indiv, het_time, bias_post, bias_pre, rmse_post) %>% 
  pivot_longer(cols = c(bias_post, bias_pre, rmse_post), names_to = "statistics") %>% 
  mutate(
    statistics = str_to_title(str_replace(statistics, "_", " ")),
    het_indiv = str_to_title(str_replace(het_indiv, "_", " ")),
    het_indiv = paste("Indiv:", het_indiv),
    het_time = str_to_title(str_replace(het_time, "_", " ")),
    het_time = paste("Time:", het_time),
  ) %>% 
  # filter(statistics != "rmse_post") %>% 
  ggplot(aes(x = "", y = value, fill = statistics)) +
  geom_col(position = "dodge") +
  facet_grid(het_indiv ~ het_time) + 
  labs(
    title = "Comparison of different statistics across different treatment types",
    x = "",
    y = "Mean value for each statistic",
    fill = ""
  )

One can notice that the TWFE issue discussed here is much more prevalent in the case of effects that are non constant in time and that increase in time. As discussed in the literature, in cases with non constant treatment effects in time and where treatment effect are larger for units treated first (an rather common setting), TWFE effects may yield highly incorrect values.